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Effective connectivity analysis provides an understanding of the 
functional organization of the brain by studying how activated regions 
influence one other. We propose a nonparametric Bayesian approach 
to model effective connectivity assuming a dynamic nonstationary 
Q, ^ neuronal system. Our approach uses the Dirichlet process to specify 

.^^ ■ an appropriate (most plausible according to our prior beliefs) dy- 

. ' namic model as the "expectation" of a set of plausible models upon 

j^ , which we assign a probability distribution. This addresses model un- 

certainty associated with dynamic effective connectivity. We derive 
a Gibbs sampling approach to sample from the joint (and marginal) 
posterior distributions of the unknowns. Results on simulation exper- 
iments demonstrate our model to be flexible and a better candidate 
^ ' in many situations. We also used our approach to analyzing func- 

tional Magnetic Resonance Imaging (fMRI) data on a Stroop task: 
our analysis provided new insight into the mechanism by which an 
individual brain distinguishes and learns about shapes of objects. 



(^ ' 1. Introduction. Functional magnetic resonance imaging (fMRI) is a non- 

invasive technique for detecting regions in the brain that are activated by the 
apphcation of a stimulus or the performance of a task. Although important 
neuronal activities are responsible for such activation, these are very subtle 

k> ', and can not be detected directly. Instead, local changes during neuronal ac- 

^ ' tivity in the flow, volume, oxygen level and other characteristics of blood, 

called the blood oxygen level dependent (BOLD) response, form a proxy. 
Much research in fMRI has focused on identifying regions of cerebral activa- 
tion in response to the activity of interest. There is, however, growing interest 
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2 S. BH ATTACH ARYA AND R. MAITRA 

in obtaining better understanding of the interactions between different brain 
regions during the operation of the BOLD response. The study of how one 
neuronal system interacts with another is called effective connectivity anal- 
ysis [Friston (1994); Nyberg and Mcintosh (2001)]. We illustrate this in the 
context of obtaining greater insight into how an individual brain performs 
a Stroop task, which is also the main application studied in this paper. 

1.1. Investigating the attentional control network in a Stroop task. The 
human brain's information processing capability is limited, so it sifts out ir- 
relevant details from task-relevant information using the cognitive function 
called attention. Specifically, task-relevant information is filtered out either 
because of intrinsic properties of the stimulus (bottom-up selection) or in- 
dependently (top-down selection) [Frith (2001)]. The brain's preference for 
task-related information in top-down selection requires coordination of neu- 
ral activity via an Attentional Control Network (ACN) which has systems 
to process task-relevant and irrelevant information and also a "higher-order 
executive control system" to modulate the frequency of neuronal firings in 
each [Banach et al. (2000)]. Thus, the higher-order system can execute top- 
down selection by increasing neuronal activity in the task-relevant processing 
system while suppressing it in its task-irrelevant counterpart. Many studies 
have empirically found the dorsal lateral prefrontal cortex (DLPFC) to be 
the main source of attentional control, while the task-relevant and irrele- 
vant processing sites depend on whether the stimulus is visual, auditory or 
in some other form. 

Jaensch (1929) and Stroop (1935) discovered that the brain is quicker at 
reading named color words (e.g., blue, yellow, green, etc.) when they are 
in the concordant color than if they are in a discordant color. Tasks struc- 
tured along these lines are now called Stroop tasks. A much-studied two- 
phase experiment [Milham et al. (2002, 2003); Ho, Ombao and Shumway 
(2003, 2005); Milham, Banich and Barad (2003); Bhattacharya, Ho and 
Purkayastha (2006)] designed around such a task provided the dataset for 
our investigation. In the first phase, a subject was trained to associate each 
of three unfamiliar shapes with a unique color word ( "Blue," "Yellow" and 
"Green" ) with 100% accuracy. The second (testing) phase involved alternat- 
ing six times between blocks of eighteen interference and eighteen neutral 
trials. The neutral trial consisted of printing the shape in a neutral color 
(white). The interference trial involved presenting the subject with one of 
the learned shapes, but printed in a color different from that learned to be 
represented by that shape in the learning phase. The subject's task was to 
subvocally name the shape's color as trained in the learning phase, ignor- 
ing the color presented in the testing phase. Each neutral or interference 
trial consisted of a 0.3-s fixation cross, a 1.2-s stimulus presentation stage 
and a 0.5-s waiting state till the next trial. fMRI images were acquired and 
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processed to obtain three activated regions, whose averaged post-processed 
time series are what we analyze further to investigate attentional control. 
These three regions — also denoted as Regions 1, 2 and 3 in this paper — 
were the lingual gyrus (LG), the middle occipital gyrus (MOG) and the 
DLPFC, and were chosen as representatives of task- irrelevant, task-relevant 
and executive-control systems, respectively. The LG is a visual area for pro- 
cessing color information [Corbetta et al. (1991)], which in our context is 
task-irrelevant [Kelley et al. (1998)]. The MOG is another visual area but 
processes shape information, which is the task-related information (form of 
the shape) in the experiment. We refer to Bhattacharya, Ho and Purkayastha 
(2006) for further details on data collection and post-processing, noting here 
that, as in that and other preceding papers, the objective is to investi- 
gate and to understand the working of the ACN mechanism in performing 
a Stroop task. 

1.2. Background and related work. Structural equation modeling [Mcin- 
tosh and Gonzalez-Lima (1994); Kirk et al. (2005); Penny et al. (2004)] and 
time-varying parameter regression [Biichel and Friston (1998)] are two early 
approaches that have been used to determine effective connectivity. In gen- 
eral, both approaches ignore dynamic modeling of the observed system, even 
though the latter accounts for temporal correlation in the analysis. There 
is, however, strong empirical evidence [Aertsen and Preifil (1991); Friston 
(1994); Mcintosh and Gonzalez-Lima (1994); Biichel and Friston (1998); 
Mcintosh (2000)] that effective connectivity is dynamic in nature, which 
means that the time-invariant model assumed by both approaches may not 
be appropriate. Ho, Ombao and Shumway (2005) overcame some of these 
limitations by modeling the data using a state-space approach, but did not 
account for the time- varying nature of the effective connectivity parameters. 

An initial attempt at explicitly incorporating the time-varying nature of 
effective connectivity in addition to dynamic modeling of neuronal systems 
was by Bhattacharya, Ho and Purkayastha (2006), who adopted a Bayesian 
approach to inference and developed and illustrated their methodology with 
specific regard to the ACN mechanism of the LG, MOG and DLPFC regions 
in conducting the Stroop task outlined above. We summarize their model — 
framing it within the context of more recent literature in dynamic modeling 
of effective connectivity — and discuss their findings and some limitations 
next. In doing so, we also introduce the setup followed throughout this paper. 

1.2.1. Bayesian modeling of dynamic effective connectivity. Let yi{t) be 
the observed fMRI signal (or the measured BOLD response) corresponding 
to the ith region at time t, i = 1,2,. . . ,R, t = 1,2, . . . ,T. Specifically, yi{t) 
is some voxel-wise summary (e.g., regional average) of the corresponding 
detrended time series in the ith region. Following Bhattacharya, Ho and 
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Purkayastha (2006), let Xi{t) be the modeled BOLD response [as opposed 
to the measured BOLD response, yi{t)], that is, the stimulus s{t) convolved 
with the hemodynamic response function (HRF) /ij(i) for the ith region 
and time point t. In this paper, hi{t) is assumed to be the very widely-used 
standard HRF model of Glover (1999) which differences two gamma func- 
tions and has some very appealing properties vis-a-vis other HRFs [Lu et al. 
(2006, 2007)]. Then the model for the observed fMRI signal can be hierar- 
chically specified as 

(1) yi{t) = ai + Xiit)Pi{t) + £i{t), 

where ai and Pi{t) are the baseline trend and activation coefficients for the 
ith. region, the latter at time t. The errors ej(i)'s are all assumed to be 
independent N{0,af), following Worsley et al. (2002). From Bhattacharya, 
Ho and Purkayastha [(2006), page 797], we assume that Xi{-) = x{-) for 
i = l, . . . ,R, that is, we use the same HRF hi{-) = h{-) for each of the R re- 
gions. Note that, as argued in that paper, this homogeneous assumption 
on the x{-) is inconsequential because it is compensated by the /3i(i) that 
are associated with x{t), and allowed to be inhomogeneous with respect 
to the different regions. Also, following Bhattacharya, Ho and Purkayastha 
[(2006), page 799], we assume that af = o"^; i = 1,. . . ,R. Actually, (1) is 
a generalization of a very standard model used extensively in the literature — 
see, for example, Lindquist [(2008), equation (9)] or Henson and Friston 
[(2007), page 179, equation (14.1)], who use the same model but with a con- 
stant time-invariant (3{t) = /?. (Indeed, as very helpfully pointed out by 
a reviewer, this last specification is also the general linear model com- 
monly used to analyze fMRI data voxel- wise, such as in statistical para- 
metric mapping and related conventional whole brain activation studies.) 
Our specific generalization incorporates time-varying /3(t) and follows Ho, 
Ombao and Shumway (2005), Bhattacharya, Ho and Purkayastha (2006) or 
Harrison, Stephan and Friston [(2007), cf. page 516, equation 38.18] — note, 
however, that the latter model /3(t) as a random walk [see equation 38.19, 
page 516, of Harrison, Stephan and Friston (2007)]. We prefer allowing for 
time- varying activation /3j(t) in order to address the "learning" effect of- 
ten reported in fMRI studies whereby strong activation in the initial stages 
of the experiment dissipates over time [Gossl, Auer and Fahrmeir (2001); 
Milhametal. (2002, 2003); Milham, Banich and Barad (2003)]. Further 
modeling specifies the activation coefficient in the ith region at the tth 
time-point in terms of the noise-free BOLD signal in the other regions at 
the previous time-point. Thus, 



(2) ft(i)=x(t-l) 



R 



Y,^ii{t)l3i{t-1) 
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where uJi{t) are independent A^(0,o"^)-distributed errors and 7ij(t) is the 
influence of the jth region on the ith region at time t. Under (2), functionally 
specified cerebral areas are not constrained to act independently but can 
interact with other regions. Our objective is to make inferences on ^ij{t) 
in order to understand the functional circuitry in the brain as it processes 
a certain (in this paper, Stroop) task. 

Equations (1) and (2) together specify one of many Vector Autoregres- 
sive (VAR) models proposed by several authors [Harrison, Penny and Fris- 
ton (2003); Goebel et al. (2003); Rykhlevskaia, Fabiani and Gratton (2006); 
Sato et al. (2007); Thompson and Siegle (2009); Patriota, Sato and Achic 
(2010)]. To see this, note that for i = 1,. . . ,R, j3i{t — 1) depends linearly 
upon yi{t — 1). Hence, substituting this in (2) yields /3i(t) = gi{yi{t — 1), 
y2{t — l), . . . ,yji{t — l)), for known functions gi , which are linear in yi (t — 1 ) , 
y2(t — 1), . . . ,yR{t — 1). Then substituting /3j(i) in (1), we see that for each 
i = 1,. . . ,R, yi{t) is a linear function of yi{t — l),y2{t — 1), . . . ,yR{t — 1). 
Hence, the vector y(i) = (yi(t), . . . ,y/j(t))' is a linear function of the vec- 
tor y{t — 1) = {yi{t — 1), . . . ,y/j(t — 1))'. As a result, our model is a first 
order VAR model from the viewpoint of the responses. It is of first or- 
der since y(t) depends upon y(i — 1), given y(l),...,y(i — 1). Moreover, 
(2) shows that the activation coefficients f3i{t) are modeled as first order 
VAR; that is, the i?-component vector (/3i(t), . . . ,/3/{(t))' depends linearly 
upon(/3i(t-l), ...,/3fl(i-l))'. 

VAR models provide an alternative or a substantial generalization [Fris- 
ton (2009)] to the Dynamic Causal Modeling (DCM) approach proposed by 
Friston, Harrison and Penny (2003), at least in continuous-time, to model 
the change of the neuronal state vector over time, using stochastic differ- 
ential equations. In DCM, the observed BOLD signal is modeled as yi{t) = 
fi{t) + Pzi{t) + ei(t), where Zi{t) denotes nuisance effects, and ri{t) is a mod- 
eled BOLD response obtained by first using a bilinear differential (neural 
state) equation, parametrized in terms of effective connectivity parameters 
and involving s{t), then subsequently using a "balloon model" transforma- 
tion [Buxton, Wong and Frank (1998) or extensions Friston et al. (2000); 
Stephan et al. (2007)] to the solution of the bilinear differential equation. 
DCM thus uses both rj(t) as well as the nuisance effects Zi{t) to model the 
observed BOLD response, with rj(t) playing the same role as our Xi{t) with 
the exception that the latter is obtained using the more widely-used Glover 
(1999) HRF model. Further, DCM assumes a deterministic relationship be- 
tween the different brain regions unlike (2) which allows for noisy dynamics 
[Bhattacharya, Ho and Purkayastha (2006)]. 

Thompson and Siegle (2009) contend that VAR models have gained pop- 
ularity in recent years because "the direction and valence of effective connec- 
tivity relationships do not need to be pre-specified." As such, these models 
have provided a useful framework for effective connectivity analysis. 
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Bhattacharya, Ho and Purkayastha (2006) proposed a symmetric random 
walk model for ")ij{t): 

(3) 7y(i)=7*j(i-l)+5*j(i) fori,i = l,2,...,i?;t = 2,3,...,r, 

where 5ij are independent A^(0,(j|)-distributed errors. In this paper we use 
A4rw to refer to the model specified by (1), (2) and (3). The effective con- 
nectivity parameters 7ij(t); (i, j) = 1, . . . ,i?, also form a VAR model of the 
first order. To see this, let T{t) = {'~fij{t);i,j = l,...,i2)'. Then it follows 
that r(t) = ir(t — 1) + S{t), where I is the R x i?-order identity matrix and 
S{t) = {6ij{t);i,j = 1, . . . ,-R)', indicating that 7ij(t)'s are within the frame- 
work of a VAR model. 

Bhattacharya, Ho and Purkayastha (2006) specified prior distributions on 
the parameters and hyperparameters of this model and used Gibbs sampling 
to learn the posterior distributions of the unknowns. We refer to that paper 
for details and for results on simulation experiments using AIrw, noting 
here only that their Bayesian-derived inference supported ACN theory and, 
more importantly, the notion that effective connectivity is indeed dynamic 
in the network. Further, they found that the restricted model with 731 (i) = 
732 (i) = Vi was the best-performer, implying no direct feedback from the 
two sites of control (LG and MOG) to the source (DLPFC). Interestingly, 
however, and perhaps surprisingly, their estimated 7jj(t)'s (see Figure 6 
in their paper) had very little relationship with the nature of the BOLD 
response (see Figure 1, bottom panel, in that paper). This is surprising 
because from (1), we have /3j(t) = {yi(t) — «« — ei{t))/x{t), and similarly for 
/3j(f — 1), which when substituted on the right-hand side of (2) makes it 
independent of x{-). This means that the effective connectivity parameters 
7jf(t) depend upon Pi{t), the left-hand side of (2). Since /3j(t) is a function 
of x{t), it is reasonable to expect 7i£(t)'s to depend upon x{t), but such 
a relationship was not found in Bhattacharya, Ho and Purkayastha (2006). 
This perplexing finding led us to first investigate robustness of A4rw to even 
slight misspecifications. 

1.2.2. Robustness of the random walk model. We tested the effect of 
a slight departure from A^rw by simulating, instead of from (3), from the 
following stationary autoregressive model: 

(4) jij{t) = 0.999jij{t - 1) + dij{t) for i, j = 1,2,. . . ,i?;t = 2,3, . . . ,r. 

We call this slightly modified model AIrw'- Here, T = 285 and i? = 3 to 
match the details of the dataset of Section 1.1. We fit TWrw to data sim- 
ulated from tVIrw'- Figure 1 displays the estimated posterior distributions 
of 7ij(t). The marginal posterior distribution of each ■jij{t) is represented 
here by eight quantiles, each containing 12.5% of the distribution: increased 
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Fig. 1. Posterior densities of 'yij{t);t — 1, ... ,T;i,j — 1,2,3, under model Mrw on data 
simulated under model AIrw' ■ The opacity of shading in each region is proportional to the 
area under the density m that region. The solid line stands for the true values of ^ij (t) . 



opacity in shading denotes denser regions. Solid lines represent true values. 
As seen, many parts of the posterior distribution have very little coverage 
of the true effective connectivity parameters: this finding is also supported 
by Table 1 which provides the proportion of true values included in the 95% 
highest posterior density (HPD) credible intervals [Berger (1985)] (these are 
the shortest intervals with posterior probability 0.95). Thus, performance de- 
grades substantially even though A^rw is not all that different from AIrw- 
Hence, modeling the process by a random walk may be too restrictive and 
thus a better approach may be needed. We do so in this paper by embed- 
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Table 1 

Proportion of true ^ij (t) included m the 95% posterior credible 

intervals obtained using model Mbsm on data simulated using TMrw' 
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ding an (asymptotically) stationary first order autoregressive AR(1) model 
in a larger class of models. Formally, we employ a Bayesian nonparametric 
framework using a Dirichlet Process (DP) prior whose base distribution is 
assumed to be that implied by an AR(1) model. The intuition behind this 
modeling style is that although one might expect the actual process to be 
stationary, the assumption might be too simplistic, and it is more logical 
to think of the stationary model as an "expected model," thus allowing for 
nonstationarity (quantified by the DP prior) in the actual model. Theoret- 
ical issues related to the construction of DP-based nonstationary processes 
are discussed in Section 2.1. In Section 2.2 we introduce our new model- 
ing ideas using the developments in Section 2.1. The efficacy of the new 
model is compared with its competitors on some simulated datasets in Sec- 
tion 3. The new approach is applied in Section 4 to the dataset introduced in 
Section 1.1 to investigate effective connectivity between the LG, MOR and 
DLPFC regions. We conclude in Section 5 with some discussion. Additional 
derivations and further details on experiments and data analyses are pro- 
vided in the supplement [Bhattacharya and Maitra (2011)], whose sections, 
figures and tables have the prefix "S-" when referred to in this paper. 

2. Modeling and methodology. 

2.1. A nonstationary Dirichlet process model for time series observations. 
A random probability measure G on the probability space {T,B^) sampled 
from the Dirichlet Process (DP) denoted by DP(rGo), and with known 
distribution Gq and precision parameter r, can be represented almost surely, 
using the constructive method provided in Sethuraman (1994), as 



(5) G = Y,pk5. 



%■' 



fc=i 



where pi = hi and pk = bk Yli=i (^ — be),k = 2,3,. .. , with ft^'s being inde- 
pendent, identically distributed (henceforth i.i.d.) (3(1, r) random variables. 
The values 7^ are i.i.d. realizations from Gq, for A; = 1,2, ... , and are also 
independent of {61,62,...}. Note that (5) implies that G is discrete with 
probability one, and has expectation Gq. DPs thus provide ways to place 
priors on probability measures. 
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The dependent Dirichlet process (DDP) is an extension of the DP in the 
sense that it allows for a prior distribution to be specified on a set of random 
probability measures, rather than on a single random probability measure. 
In other words, the realizations 7^ can be extended to accommodate an 
entire time-series domain 7", such that F^ ^ = {7^^;^ S T}- Following (5), 
the random process thus constructed can be represented as 

00 
(6) G(^)^^pfc<5r.^ 

fe=i 

with form similar to that used for spatial DP models [see Gelfand, Kottas 
and MacEachern (2005)]. Note that Tlj- in (6) are realizations of some 

stochastic process T-j- = {7*; t € T}, with distribution Gq for k = 1,2, 

Hence, Kolmogorov's consistency holds for T-j-. That is, finite dimensional 
joint distributions {7t;i€tT}, for ordered time-points tT = {h, . . . ^tr}, 
can be obtained from all finite but higher-dimensional joint distributions 
{'yt',t G t^Utr} (here t^ is a finite set) specified by the process, by marginal- 
izing over {jt'jt G ty}. Since (6) shows that G^*^^ is specified completely by 
the process Tj- and {pk] fc = 1, 2, . . .}, and since the latter are independent 
of t, it follows that Kolmogorov's consistency holds for G' ', providing 
a formal setup of a stochastic process of random distributions. In partic- 
ular, for any t € 7", G^^'-'' ~ DP(rGQ ) [and admits the representation 
Gii*}) = Yl'k'=i Pk^-y* ]• The collection of random measures G^' ^ is said to 
follow the DDP [see,*e.g., MacEachern (2000); De lorio et al. (2004); Gelfand, 
Kottas and MacEachern (2005)]. 

The process Tj- may be a time series that is stationary or — as adopted 
in our application and more realistically — asymptotically so. Indeed, while 
asymptotic stationarity is a very slight departure from stationarity. Sec- 
tion 1.2.2 demonstrates that it can have quite a significant impact on in- 
ference. It is also important to observe that although the process may be 

(T) 
stationary or asymptotically stationary under Gq , the same process when 

conditioned on G"^ is not even asymptotically stationary. Specifically, 

00 00 / 00 \ '^ 

S(7t I G(^)) = ^Pk7l„ Var(7, | G^^)) = ^^,(7,;)' - Y^P^llt 
fe=i fc=i \fe=i / 

and 

00 / 00 \ / 00 \ 

Cov(73,7t I GC^)) = ^Pfc7L7fct - J^P^^k^ ^Pkl*kt ■ 

fe=l \fc=l / \A:=1 / 

Thus, G"^ is nonstationary, although under Gq , Tf may have a station- 
ary model so that the mean is constant and the covariance depends upon 
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time only through the time lag |t — s|. Thus, we have defined here a process 
G^ ' that is centered around a stationary process, but is actually nonsta- 
tionary. For purposes of applications, we have given (ordered) time-points 
(fi, . . . ,ir)i a T-variate distribution G^-^^ on the space of all T-variate dis- 
tributions of (71, . . . ,7t)' with mean Gq being the T-variate distribution 
implied by a standard time series. 

The development of our nonstationary temporal process here technically 
resembles that of a similar spatial process in Gelfand, Kottas and MacEach- 
ern (2005), but differs from the latter in that it is actually embedded in the 
model for the observed fMRI signals. As a result, the full conditional dis- 
tributions of jij (t) 's in our model are much more general and complicated 
than similar derivations following Gelfand, Kottas and MacEachern (2005). 
Another important difference between our approach and that of Gelfand, 
Kottas and MacEachern (2005) is that the latter had to introduce a pure 
error ("nugget") process to avoid discreteness of the distribution of their 
spatial data. Such discreteness of the distribution (of our temporal data) 
is naturally avoided here, however, owing to the embedding approach used 
in our modeling. Gelfand, Kottas and MacEachern (2005) also rely on the 
availability of replications of the spatial dataset: our embedding approach 
obviates this requirement by merely assuming the availability of replicated 
(unobserved) random processes. We now introduce our dynamic effective 
connectivity model. 

2.2. A Dirichlet process-based dynamic effective connectivity model. 

2.2.1. Hierarchical modeling. For i,j = 1,2,. . . ,R, define the T-compo- 
nent vectors Fjj = (7jj(1),7jj(2), . . . ,7jj(r))'. Further, let Fjj's be i.i.d. G, 
where G ~ DP(tGo), with r denoting the scale parameter quantifying un- 
certainty in the base prior distribution Gq- Also, assume that under Gq, 
7y(l) ~ iV(7,o-^) and for t = 2,...,T, ^ij{t) = pjij{t - 1) + Sij{t), where 
IpI < 1 and 6ij{t) ~iV(0,o-|) are i.i.d. for i, j = 1,2, ... ,R;t = 1,2, ... ,T. It 
follows that under Gq, Fjj r^ NxijfJ'Tj'^) where ^irp = (l,p,p^ ,. . . ,p^~^)' 
and for s <t, 5] has the (s,t)th element 

/I _ /,2(s-l) 

/"7^ >p ^s+t-2^2 I j-s2l ^ P 

(7) T.st = P^ CTy + P cr^ ( — ^_ 2 

Note that with Gq as described above, the process is stationary if we choose 
7 = and o"^ = o"|/(l — p'^), otherwise the process converges to station- 
arity for large s. In other words, under Gq, £'(7jj(s)) = £'(/>'*~^7jj(1) + 
'l2r=o P^ ^ij (^ — r)) = p^~^^ which converges to as s — ?> 00, while from (7) 
it follows that, as s ^- 00 with t — s < 00, T,st -^ p*~*(t|/(1 — p'^). The case 
for s > t is similar. Using the above developments, we specify our dynamic 
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effective connectivity model hierarchically, by augmenting (1) and (2) with 
the following model for 7jj(t)'s: 

Tij '-^ • G(^) for i,j = 1,2,..., R, where G^^) ~ DP(rGf ^). 

Distributional assumptions on ej(i)'s, a;i(t)'s and 5jj(t)'s are as in Sec- 
tion 1.2.1. We use A^dp to refer to this model: note also that as r — )• oo, 
our DP-based model converges to the AR(1) model, which we denote us- 
ing TWar- We note in closing that the effective connectivity parameters are 
AR(1), hence VAR, under the expected distribution of A^dp- Of course, they 
are trivially also so under A^ar- Note, however, that given a realization of 
a random distribution from the Dirichlet process, such VAR representation 
does not hold. 

2.2.2. Other prior distributions. We specify independent prior distribu- 
tions on each of a'^,a'^,a'^,p,T, ai,/3i(l) and 7jj(1);z,j = l,2,...,i?. Specif- 
ically, Oj's are assumed to be i.i.d. N{fj.i,a'^) for i = 1,2,. . . ,R and /3j's 
are assumed to be i.i.d. N{P,a'a), for i = 1,2,. ..,R. Also, 7ij(l)'s are in- 
dependently distributed with mean 7 and variance o"^, while p is uniformly 

distributed in (—1,1), r~r(aT-,feT) and o"^^, o"~^ and aj are each i.i.d. 
T{a,b) with density having the functional form. Here fii,a'^,f3 and a'i, 7 

and (7^, a, b, a-j- and b^- are all hyperparameters. In our examples, we take 
a = b = 0, reflecting our ignorance of the unknown parameter a'j. Although 
the Gamma priors with a = b = are improper, they yielded proper posteri- 
ors in our case, vindicated by fast convergence of the corresponding marginal 
chains and resulting right-skewed posterior density estimates, which are ex- 
pected of proper posteriors having positive support. For (arjbr) we first 
fix the expected value of T{ar,bT-) (given by a^/br) to be such that in the 
full conditional distribution of Fjj, given by (8), the "expected" probabil- 
ity of simulating a new realization from the "prior" base measure approx- 
imately equals the probability of selecting realizations of Fj/j/, for some 
(^'liO 7^ (^) j)- Hence, if there are R? nonzero Fjj in the model, then setting 
aj- = c{R? — 1) and b^ = c serves the purpose. The resulting prior distribu- 
tion has variance equal to its expectation if c = 1. To achieve large variance, 
we set c = 0.1; the associated prior worked well in our examples. We also 
experimented with c = 0.01 and c = 0.001 and noted that while the case with 
c = 0.1 provided the best results (see Tables S-1 and S-2), inferences related 
to the posterior distributions of the observed data were fairly robust with 
respect to different choices of c. Moreover, the results demonstrate that in 
terms of percentage of inclusion of the true 7ij's, all inclusion percentages, 
with the exception of 732 and 733, were quite robust with respect to c. The re- 
sults corresponding to c = 0.01 and c = 0.001 were quite similar, while those 
corresponding to c = 0.1 yielded better performance. Further, other hyper- 



12 S. BHATTACHARYA AND R. MAITRA 

parameters were estimated empirically from the data as in Bhattacharya, 
Ho and Purkayastha (2006) using Berger's (1985) ML-II approach. 

2.2.3. Full conditional distributions. The posterior distribution of the 
parameters are specified by their full conditionals, which are needed for 
Gibbs sampling. The full conditional distributions of Oi, fii{t), a^ and cr^ are 
of standard form (see Section S-1.1), while those of the T^s require some 
careful derivation. To describe these, note that, on integrating out G^'^\ 
the prior conditional distribution of Tij given Tki for {k,tj ^ («,i) follows 
a Polya urn scheme, and is given by 

(8) [^.,|^.,;(fe,£)/(^,,)]-■ "^""^ + ^(M)^(m) ^^« 



T + #{(A:,^):(A:,£)/(i,j)}- 

The above Polya urn scheme shows that marginalization with respect to G 
induces dependence among Fjj in the form of clusterings, while maintain- 

(T) 

ing the same stationary marginal Gq for each Tij . For Gibbs sampling we 
need to combine (8) with the rest of the model to obtain the full condi- 
tional distribution given all the other parameters and the data. We obtain 
the full conditionals by first defining, for i,j = l,2,...,R, diagonal matri- 
ces Ai, = aj2diag{0,x2(l)/32(l),a;2(l)/32(2), . . . ,x\T -1)P]{T -1)}, where 
diag lists the diagonal elements of the relevant matrix. We also define T- 
variate vectors Bjj for i,j = 1,2, ... ,R with first element equal to zero. For 
t = 2, . . . ,r the tth element of Bjj is Bij{t) = (7-^[l3i(t)f3j{t - l)x(t - 1) - 
[3j{t — l)x^(t — 1) J2e=i ■ ij^j lii{t)Pf.{'t — !)]• Further, we note that, thanks to 
conditional independence, it is only necessary to combine (8) with (2) to 
obtain the required full conditionals. It follows that 



(9) 


[r.. I •••]-. r^G(p+ Y^ ,(-)^r«, 


{k/)i^{i,3) 


where G^- • is the T-variate normal distribution with mean (S~^ + 


Ajj)"^(75]"^ X /x + Bjj) and variance (S"^ + Ajj)"^ Also, 


3i) _p ^ 


(10) X exp 


-^ItV^e-Vt 



- (75]-Vt + B,j)'(5]-i + AijO^^TS- Vt + B^j)} 

and 

(11) g(^^) = Cexp[-l(rH - Ar.iB,,)'A,,(rfc, - A-.^B,,) - B^A-.^B,,] 
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for {k,i) / (i,i), with C chosen to satisfy i?o +S(fc i)^(i j) l^''^^ ~ ^- Observe 

that unlike all DP-based approaches hitherto considered in the statistics 

(T) 

literature, in our case G- , the conditional posterior base measure is not 
independent of Fj/j/ for {i',j') i^ (hj), which is a consequence of the fact 
that, thanks to (2), Fj/j' are not conditionally independent of each other. 
Thus, our methodology generalizes other DP-based methods, including that 
of Gelfand, Kottas and MacEachern (2005). 

Section S-1.2 presents an alternative algorithm to updating Fjj using con- 
figuration indicators which are updated sequentially using themselves and 
only the distinct Fj-,-, given everything else. MacEachern (1994) has argued 
that such an updating procedure theoretically improves convergence prop- 
erties of the Markov chain: however, Section S-1.3 shows that in our case the 
associated conditional distributions need to be obtained separately for each 
of the 2^ possible configuration indicators. This being infeasible, we recom- 
mend (9) for updating Fjj. [We remark here that full conditionals are easily 
obtained using configuration indicators in the case of Gelfand, Kottas and 
MacEachern (2005), thanks to the relative simplicity of their spatial prob- 
lem.] Also, (10) and (11) imply that as r — ?> oo, the full conditional distribu- 

tion (9) converges to G^ ■ , which is actually the full conditional distribution 
of the entire T-dimensional parameter vector Fjj under the AR(1) model. 
In either case, we provide computationally efficient multivariate updates for 
our Gibbs updates: this makes our problem computationally tractable. 

To obtain the full conditional of r, define m = #{{i,j);i,j = 1,2, . . . ,i?} = 
R^. Then, note that as in Escobar and West (1995), for a T{ar,br) prior on r, 
the full conditional distribution of the latter, given the number (d) of dis- 
tinct Tij and another continuous random variable rj, is a mixture of two 
Gamma distributions, specifically 7r^r(aT- + d,br — log(?7)) + (1 — 7r^)r(ar + 
d — Ijb-r — log(r7)), where 7r^/(l — vr^) = (a,- + d— l)/{m{br — log(?7))). Also, 
the full conditional of rj is (3(r -|- l,m). Finally, the full conditional dis- 
tributions of (t| and p are not very standard and need careful derivation. 
Section S-1.4 describes a Gibbs sampling approach using configuration sets 
for updating as and p. For implementing this Gibbs step, one does not 
need to simulate the configuration indicators, as they can be determined 
after simulating the Fj^'s using (9). Hence, this step is feasible. However, we 
failed to achieve sufficiently good convergence with this approach, and hence 
used a Metropolis-Hastings step. The acceptance ratio for the Metropolis- 
Hastings step is given by [Fn][Fi2 | Fii][Fi3 | Fi2,Fii] • • • [F33 | F32, . . . ,Fii], 
evaluated, respectively, at the new and the old values of the parameters 
((T^,p). In the above, [Fn] ~ Gq , and the other factors are Polya urn dis- 
tributions, following easily from (8). Once again, note the use of multivariate 
updates in the MCMC steps, making our updating approach computation- 
ally feasible and easily implemented. 
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We conclude this section by noting that our model is structured to be 
identifiable. The priors of Oj, /3j(t), 7ij(i) are all different and informative. 
Further, (2) shows that /3i{t) is not permutation-invariant with respect to 
the indices of Fjj's. Identifiability of our model is further supported by the 
results in this paper, which show all posteriors (based on MCMC) to be 
distinct and different. This is unlike the case of the usual Dirichlet process- 
based mixture models which are permutation-invariant, as in Escobar and 
West (1995), where the parameters have the same posterior due to noniden- 
tifiability. We now investigate performance of our methodology. 

3. Simulation studies. We performed a range of simulation experiments 
to investigate performance of our approach relative to its alternatives. Since 
there are 9 nonzero Fj^'s in our model, we followed the recipe provided in 
Section 2 and put a r(0.8, 0.1) prior on the DP scale parameter r. We inves- 
tigated fitting A^DP, -^AR and A4rw to the simulated data of Section 1.2.2 
and also to data simulated from the TWrw and TWar models, the latter with 
both p = 0.5 (clearly stationary model) and p = 0.95 (where the model is not 
so clearly distinguished from nonstationarity but more clearly distinguished 
than when p = 0.999). The Gibbs sampling procedure for model 7V4ar in our 
simulations was very similar to that of the AIrw detailed in Bhattacharya, 
Ho and Purkayastha (2006): we omit details. For all experiments in this pa- 
per and in the supplement, we discarded the first 10,000 MCMC iterations 
as burn-in and stored the following 20,000 iterations for Bayesian inference. 
Our results are summarized here for want of space, but presented in detail in 
Section S-2, with performance evaluated graphically [in terms of the poste- 
rior densities of 7ij(t)'s] and numerically using coverage and average lengths 
of the 95% HPD credible intervals of the posterior predictive distributions 
(for details, see Section S-2). 

The results of our experiments using the simulated data of Section 1.2.2 
showed that AIar performed better than A^rwi but model A^dp was the 
clear winner. Indeed, the support of the posterior distributions of 722 (i) and 
723 (i) using A^AR were much too wide to be of much use, but substan- 
tially narrower under AIdp- -Mbp also outperformed the other two models 
in terms of the proportion of true 7jj(t)'s included in the corresponding 95% 
HPD CIs. These CIs also captured almost all of the true values of 7ij(i) 
under AIdP) but far fewer values using AIar- Mbp also exhibited better 
predictive performance than AIar and MwJV- All these findings which fa- 
vor our DP-based model were implicitly the consequence of the fact that the 
true model in our experiment was approximately nonstationary, and mod- 
eled more flexibly by our nonstationary DP model rather than the stationary 
AR(1) model. That this borderline between stationarity and nonstationar- 
ity of the true model is important was vindicated by the results of fitting 
A4rW) AIar and A^dp on the dataset simulated using A^rw- Here, AIrw 
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outperformed both Mbp and AIar in terms of coverage of the true values 
of 7ij(i), indicating that A^dp may under-perform when compared to the 
true model, in terms of coverage of parameter values, when the true model 
can be clearly identified. In terms of prediction ability, however, AIdp was 
still the best performer, with the best coverage of the data points by the 
posterior predictive distribution and the lengths of the associated 95% CIs. 
This finding was not unexpected, since A^dp involves model averaging (see 
Section S-1.5), which improves predictive performance [see, e.g., Kass and 
Raftery (1995)]. For the dataset simulated from A^ar with p = 0.5, the 
true model (A^ar) outperformed AIdp marginally and A^rw substantially, 
but when p = 0.95, A^dp provided a much better fit than A^ar or A^rw- 
We have already mentioned that A4dp outperformed AIar (and AIrw) for 
the borderline case oi p = 0.999: the experiment with p = 0.95 demonstrated 
good performance of AIdp even in relatively more distinguishable situations. 
At the same time, the experiment with p = 0.5 warns against over-optimism 
regarding AIdp; for clearly stationary data, we are at least marginally better 
off replacing A4dp with a stationary model such as A4ar- In spite of this 
caveat for clearly stationary situations, our simulation experiments indicated 
that our DP-based approach is flexible enough to address stationary models 
as well as deviations. We now analyze the Stroop Task dataset introduced 
in Section 1.1. 

4. Application to Stroop task data. The dataset was preprocessed fol- 
lowing Ho, Ombao and Shumway (2005) and Bhattacharya, Ho and Pur- 
kayastha 2006, to which we refer for details while providing only a brief 
summary here. For each of the three regions (LG, MOG and DLPFC), 
a spherical region of 33 voxels was drawn around the location of peak acti- 
vation. The voxel-wise time series of the selected voxels in each region were 
then subjected to higher order (multi-linear) singular value decomposition 
(HOSVD) using methods in Lathauwer, Moor and Vandewalle (2000). The 
first mode of this HOSVD, after detrending with a running- line smoother as 
in Marchini and Ripley (2000), provided us with our detrended time series 
response yi{t) for the ith region [see Figure S-4 for y(t)'s as well as x{t)]. 

We compared results obtained using AIdp with those using AIrw 
and A^AR- We refer to Bhattacharya, Ho and Purkayastha (2006) and the 
supplement for detailed results using A4rw and A^aRi respectively, only 
summarizing them here in comparison with results obtained using A^dPj 
which we also discuss in greater detail here. Detailed studies on MCMC 
convergence are in Section S-3.2. 

4.1. Results. Figure 2 displays the Gibbs-estimated marginal posterior 
distributions of the 7ij(t)'s for each time point t obtained using A^dp- 
A striking feature of the marginal posterior densities of Figure 2 is the very 
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Fig. 2. Estimated posterior densities (means in solid lines) of the regional influences 
over time. 



strong oscillatory nature of these effective connectivity parameters with the 
modeled BOLD response x{t). This is quite different from the posterior 
distributions of 7jj(i)'s obtained using A^ar (see Figure S-7). Table 2 eval- 
uates performance of the two models in terms of the length and proportion 
of observations contained in the 95% HPD credible intervals of the poste- 
rior predictive distributions: the intervals obtained using A4dp have greater 
coverage but are also much narrower, making it by far the better choice 
among the models. Figure 2 also shows that 723(i),732(*) and 733 (t) — and, 
to a lesser extent, 721 (i) and 731 (t) — oscillate differently from the others in 
that their amplitude is close to zero. We examined this issue further through 
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Table 2 

Proportions of observed y included in, and average length, of the 

95% credible intervals of the posterior predictive distributions 

under A^ar and AIdp for the Stroop task dataset 





Proportions 


Average length 


y 


Mar 


AIdp 


AIar 


A4dp 


2/1 

2/2 
2/3 


0.92 
1.00 
1.00 


0.99 
1.00 
1.00 


4,960.9 
3,864.2 
4,352.8 


2,215.1 
2,068.1 
2,084.3 



Figure 3, which provides a map of the proportions of the cases for which 
each estimated marginal posterior density of 7ij(i) has positive support at 
time t. The intensities are mapped via a red-bhie diverging palette: thus, 
darker hues of blue and red indicate high and low values, respectively, for 
the proportions. Lighter hues of red or blue indicate values in the middle. 
Clearly, very little proportion of the marginal density is either on the positive 
or the negative parts of the real line for the cases of 723 (i), 732 (i) and 733 (t). 
We therefore investigated performance of models A^dp modified to exclude 
some or all of these regional influences. 
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Fig. 3. Proportions of estimated marginal posterior density of^ij{t) with positive support 
at t. 
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4.1.1. Investigating restricted submodels o/A^dp- Bhattacharya, Ho and 
Purkayastha (2006) found that the model A^rw with the constraint 731 (t) = 
732(0 = (henceforth M^t^) provided better results that the unconstrained 
A^RW Figure 2 also points to the possibility that models with some 7ij(t) = 
might provide better performance. We explored these aspects quantitatively 
using the models A^ar and Mbp, by considering the proportion of data 
contained in, and the average lengths of, the 95% HPD CIs of the corre- 
sponding posterior predictive distributions of yi{t);i = l,2,3,i = 1, . . . ,T. 
A systematic evaluation of all possible submodels is computationally very 
time-consuming, so we investigated models with combinations of 731 (i) = 
732 (i) = as in Bhattacharya, Ho and Purkayastha (2006) and with null 
jij(t)'s for those (j,j)'s whose posterior distributions exhibited less ampli- 
tude of oscillation as per Figure 2. Table 3 summarizes performances of the 
top three submodels: others are in Tables S-11 and S-12. The top three 
performers were the following: 

• M^^l: Mdp but with 733 (t) = Vt. 

• M^j^l: Mdp but with 732(t) = Vt. 

• m'^I: Mdp but with 732(i) =733(i) =0 Vt. 

Thus, M'i^l and m'^1 both beat TWdp (of Table 2). The average 95% pos- 
terior predictive length using Mj^p is about midway between Al^p and the 
unrestricted DP-based model, so we report our final findings and conclusions 
only using A^pp. 

4.2. Summary of findings. Figure 4(a)-(h) display the posterior densi- 
ties of the nonnull regional infiuences 7ij(t)'s over time. These 7jj(t)'s are 
very similar to those in Figure 2(a)-(h), with nonzero effective connectiv- 
ity parameters again having a very pronounced oscillation synchronous with 
the modeled BOLD response: indeed, only the 723 (t) of Figure 4(f) has an 

Table 3 

Proportions of the observed data in, and mean lengths of, the 95% credible intervals of 

posterior predictive distributions of yi, 1/2, 1/3 and the mean lengths of the 95% credible 

intervals for the top three candidate submodels 







Proportion 






Mean length 
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0.99 

1.0 

1.0 


0.99 

1.0 

1.0 


1.0 
1.0 
1.0 


2,097.6 
1,971.6 
1,985.0 


2,140.4 
2,019.5 
2,021.3 


2,276.5 
2,127.8 
2,125.4 
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Fig. 4. (a)-(h) Estimated posterior densities (means in solid lines) of the nonnull regio- 
nal influences over time using A1J-,p. (i) Proportion of the posterior distribution of "/ij{t) 
with positive support at time t . 



oscillation slightly more damped than in Figure 2. Fm'ther, Figure 4(i) in- 
dicates that the estimated posterior densities put most of their mass either 
below zero [when x(t) is negative] or above zero [when x{t) is positive]. In- 
deed, these densities have substantial mass around zero only when x{t) is 
around zero. We also smoothed the modeled BOLD response x{t) to explore 
further its relationship with each of the estimated posterior mean 7jj(t)'s 

from Mj^p. For each t, we specified x{t) = Acos{27rujt + (f)) + ipt where ipt 
are i.i.d. N{0,a'i), A is the amplitude of the time series, to is the oscilla- 
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tion frequency and is a phase shift. Equivalently, x{t) = /3i cos(27ra;t) + 
/32 sin(27ra;i) + ^4 with /3i = Acos((^) and /32 = Asin{(j)). We obtain a) = 0.02 
using the periodogram approach [see, i.e., Shumway and Stoffer (2006)]. 
Thus, each cycle in x{t) has a length of about 50 time-points. A least squares 
fit yields /3i = 0.27 and (32 = -0.61, hence, A = 0.80 and (f) = 1.16. Figure S-8 
shows that the smoothed BOLD response x{t) = j3i cos(27ra)i) + 1^2 sin(27ra)i) 
closely approximates the original time series x{t). The correlation of x{t) 
with each of 711 (t), 712 (t), 7i3(t), 721 (t), 722(i), 723 (t), 73i(*) and 732 (t) are 
0.959, 0.909, 0.952, 0.950, 0.922, 0.874, 0.949 and 0.929, respectively. Thus, 
7jj(t)'s are not completely linear in the BOLD response, but very close to 
being so with regard to its transformed version. 

The results of our analysis indicate that the region LG, centered around 
zero, exhibits very strong evidence of self-feedback, oscillatory with high am- 
plitude, and period of about 50, matching the period of the modeled BOLD 
response x{t). Similar influences are exerted by both MOG and DLPFC on 
LG and by the MOG region on itself. Indeed, Figure 4 indicates that these 
four inter- and intra-regional influences have, broadly, a similar pattern in 
terms of amplitude. The influence of LG on MOG and DLPFC is smaller 
and similar to each other. Further, Figure 4(f) and (h) indicate that the 
feedback provided by DLPFC on MOG [723(^)1 is similar to that in the re- 
verse direction [732(0]- Thus, there are three broad patterns in the way that 
inter-and intra-regional influences occur. 

Our analysis also demonstrates the existence of the ACN and its mech- 
anism while performing a Stroop task. Thus, the executive control system 
(DLPFC) provides instruction to both the task-irrelevant (LG) and task- 
relevant processing sites (MOG) but gets similar levels of feedback from the 
task-relevant processor (MOG). LG which sifts out the task-irrelevant color 
information gets a lot of feedback in doing so from both itself and MOG. 
However, it provides far less feedback to the task-relevant shape information 
processing MOG and the executive control DLPFC. MOG itself provides 
substantial self-feedback while processing shape information. Finally, note 
that while our results indicate higher amplitudes for inter-regional feedback 
involving 7jj(i)'s when they involve LG rather than MOG, this is consistent 
with the established notion that processing shape information is a higher- 
level (more difficult) cognitive function than distinguishing color. 

The results on the effective connectivity parameters using TWdp are very 
different from those done using M.^^ [see Figure 5 of Bhattacharya, Ho 
and Purkayastha (2006)] or Mar- Using A^R-yV' Bhattacharya, Ho and 
Purkayastha (2006) found some evidence of self-feedback only in LG: the 
95% HPD BCRs contained zero unless t increased. Further, while the re- 
lationship of the posterior mean appeared somewhat linear in i, there was 
no relationship with the modeled BOLD response. Most 7jj(i)'s [with the 
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exception of 7i3(t)] were almost invariant with respect to time t, unlike 
the clear oscillatory nature of the time series obtained here using A^^p (°^ 
even A^dp)- The fact that the BOLD response had very little relationship 
with these effective connectivity parameters is perplexing, given that these 
regions were the ones found to be activated in the preprocessing of the fMRI 
dataset. The results on 7jj(i)'s using A^ar were also very surprising: while 
the posterior means oscillated synchronously with x{t) only for the task- 
irrelevant LG with a correlation of 0.943, there was no evidence of nonzero 
values for all the other effective connectivity parameter values (including the 
task-relevant MOG), since their pointwise 95% HPD credible regions all con- 
tained zero for all time t. This is very unlike the results obtained using A^^P' 
which also established the existence of the ACN theory in performing this 
task. Indeed, among all the approaches considered in the literature and here 
on this dataset, only the DP-based analyses have been able to capture both 
the dynamic as well as the oscillatory nature of the effective connectivity pa- 
rameters. In doing so, we also obtain further insight into how an individual 
brain performs a Stroop task. 

5. Conclusions and future work. Effective connectivity analysis provides 
an important approach to understanding the functional organization of the 
human brain. Bhattacharya, Ho and Purkayastha (2006) provide a coherent 
and elegant Bayesian approach to incorporating uncertainty in the analy- 
sis. In this paper we note that this approach also brings forth with it some 
limitations. In this paper we therefore propose a nonstationary and nonpara- 
metric Bayesian approach using a DP-based model that embeds an AR(1) 
process in the class of many possible models. Heuristically, our suggestion 
has some connection with model averaging, where we have, a priori, an 
AR(1) model in mind for specifying dynamic effective connectivity: the DP 
provides a coherent way to formalize our intuition. We have also derived an 
easily implemented Gibbs sampling algorithm for learning about the pos- 
terior distributions of all the unknown quantities. Simulation studies show 
that our model is a better candidate for the analysis of effective connectivity 
in many cases. The advantage is more pronounced with increasing depar- 
tures from stationarity in the true model. We also applied our methodology 
to investigate the feedback mechanisms between the task-irrelevant LG, the 
task-relevant MOG and the "executive control" DLPFC in the context of 
a single-subject Stroop task study. Our results showed strong self- feedback 
for LG and MOG, but not for DLPFC. Further, MOG and DLPFC influence 
LG strongly but the reverse is rather mild. The influence of MOG on DLPFC 
and vice versa are very similar. All these discovered feedback mechanisms 
oscillate strongly in the manner of the BOLD signal and are supportive of 
the framework postulated by ACN theory. Our analysis also provides under- 
standing into the mechanism of how the brain performs a Stroop task. All 
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these are novel findings not reported in the context of fMRI analysis in the 
literature. Thus, adoption of our DP-based approach not only provided in- 
terpretable results, but — as very kindly pointed out by a reviewer — yielded 
additional insights into the workings of the brain. 

There are several aspects of our methodology and analysis that deserve 
further attention. For one, we have investigated ACN in the context of 
a Stroop task for a single male volunteer. It would be of interest to study 
other tasks and responses to other stimuli and also to see how our results 
on a Stroop task translate to multiple subjects and to investigate how these 
mechanisms differ from one person to another. Our modeling approach can 
easily be extended to incorporate such scenarios. Further, our methodology, 
while developed and evaluated in the context of modeling dynamic effective 
connectivity in fMRI datasets, can be applied to other settings also, espe- 
cially in situations where the actual models for the unknowns may be quite 
difficult to specify correctly. Thus, we note that while this paper has made 
an interesting contribution to analyzing dynamic effective connectivity in 
single-subject fMRI datasets, several interesting questions and extensions 
meriting further attention remain. 

SUPPLEMENTARY MATERIAL 

Contents (DOI: 10.1214/11-AOAS470SUPP; .pdf). Section S-1 contains 
additional details regarding our methodology, including explicit forms of 
the full conditional distributions of specific parameters, the configuration 
indicators and the distinct parameters associated with the Dirichlet process 
needed for Gibbs sampling. Detailed arguments that show model averaging 
associated with our DP-based model A^dp are also presented there. Section 
S-2 provides additional information on our simulation experiments, includ- 
ing associated methodology and results. Section S-3 presents further details 
on the analysis of the Stroop task experiment, including display of the data, 
detailed assessment of convergence of our MCMC samplers when using TWdp 
and A^jjp ^^'^ MCMC-based posterior analysis using A^ar and other addi- 
tional models obtained by setting some effective connectivity parameters to 
zero. Additional methodological details and results regarding the smoothing 
of the modeled BOLD signal x{-) are also presented there. 
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